
Keyword Search Result

[Keyword] neural network (855 hits)

221-240 of 855 hits

  • Speech Quality Enhancement for In-Ear Microphone Based on Neural Network

    Hochong PARK  Yong-Shik SHIN  Seong-Hyeon SHIN  

     
    LETTER-Speech and Hearing

      Publicized:
    2019/05/15
      Vol:
    E102-D No:8
      Page(s):
    1594-1597

    Speech captured by an in-ear microphone placed inside an occluded ear has a high signal-to-noise ratio; however, it has different sound characteristics compared to normal speech captured through air conduction. In this study, a method for blind speech quality enhancement is proposed that can convert speech captured by an in-ear microphone to one that resembles normal speech. The proposed method estimates an input-dependent enhancement function by using a neural network in the feature domain and enhances the captured speech via time-domain filtering. Subjective and objective evaluations confirm that the speech enhanced using our proposed method sounds more similar to normal speech than that enhanced using conventional equalizer-based methods.
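
    A minimal sketch of the idea above, assuming a small feedforward network, 40-dimensional spectral features, and a 64-tap FIR enhancement filter (all sizes are placeholders, not the authors' configuration):

      # Sketch (not the authors' code): a small network predicts an FIR
      # enhancement filter from features of the in-ear signal, which is
      # then applied by time-domain convolution.
      import numpy as np
      import torch
      import torch.nn as nn

      FEAT_DIM, FIR_TAPS = 40, 64          # hypothetical feature/filter sizes

      class EnhancementNet(nn.Module):
          def __init__(self):
              super().__init__()
              self.net = nn.Sequential(
                  nn.Linear(FEAT_DIM, 128), nn.ReLU(),
                  nn.Linear(128, 128), nn.ReLU(),
                  nn.Linear(128, FIR_TAPS),  # taps of the enhancement filter
              )

          def forward(self, feats):          # feats: (batch, FEAT_DIM)
              return self.net(feats)

      net = EnhancementNet()
      feats = torch.randn(1, FEAT_DIM)       # stand-in for in-ear features
      fir = net(feats)[0].detach().numpy()
      x = np.random.randn(16000)             # 1 s of captured speech at 16 kHz
      enhanced = np.convolve(x, fir, mode="same")  # time-domain filtering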

  • Pre-Training of DNN-Based Speech Synthesis Based on Bidirectional Conversion between Text and Speech

    Kentaro SONE  Toru NAKASHIKA  

     
    PAPER-Speech and Hearing

      Publicized:
    2019/05/15
      Vol:
    E102-D No:8
      Page(s):
    1546-1553

    Conventional approaches to statistical parametric speech synthesis use context-dependent hidden Markov models (HMMs) clustered using decision trees to generate speech parameters from linguistic features. However, decision trees are not always appropriate for efficiently modeling complex context dependencies of linguistic features. An alternative scheme that replaces decision trees with deep neural networks (DNNs) was presented as a possible way to overcome this difficulty. By training the network to represent high-dimensional feedforward dependencies from linguistic features to acoustic features, DNN-based speech synthesis systems convert text into speech. To improve the naturalness of the synthesized speech, this paper presents a novel pre-training method for DNN-based statistical parametric speech synthesis systems. In our method, a deep relational model (DRM), which represents a joint probability of two visible variables, is applied to describe the joint distribution of acoustic and linguistic features. As with DNNs, a DRM consists of several hidden layers and two visible layers. Whereas DNNs represent feedforward dependencies from one set of visible variables (inputs) to the other (outputs), a DRM can represent the bidirectional dependencies between two visible variables. During maximum-likelihood (ML) training, the model optimizes the parameters of its deep architecture (connection weights between adjacent layers, and biases) considering the bidirectional conversion between 1) acoustic features given linguistic features, and 2) linguistic features given the acoustic features it generates. Because it considers whether the generated acoustic features are recognizable, our method can obtain reasonable parameters for speech synthesis. Experimental results on a speech synthesis task show that pre-trained DNN-based systems using our proposed method outperformed randomly initialized DNN-based systems, especially when the amount of training data is limited. Additionally, speaker-dependent speech recognition experiments, in which the initial parameters of our method were set to the same values as in the synthesis experiments, also show that our method outperformed DNN-based systems.
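
    The deep relational model itself is an energy-based model trained by maximum likelihood; the hedged sketch below only approximates the bidirectional idea with two coupled networks and paired reconstruction losses, with all sizes invented:

      # Illustrative only: the bidirectional pre-training idea, approximated
      # with two coupled networks rather than the paper's actual DRM.
      import torch
      import torch.nn as nn

      LING, AC = 300, 60                     # hypothetical feature sizes

      f = nn.Sequential(nn.Linear(LING, 256), nn.Tanh(), nn.Linear(256, AC))  # ling -> acoustic
      g = nn.Sequential(nn.Linear(AC, 256), nn.Tanh(), nn.Linear(256, LING))  # acoustic -> ling
      opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)

      ling = torch.randn(32, LING)           # toy paired training batch
      ac = torch.randn(32, AC)
      for _ in range(10):
          opt.zero_grad()
          ac_hat = f(ling)
          # forward loss plus a "recognizability" term: generated acoustics
          # should map back to the linguistic features they came from
          loss = nn.functional.mse_loss(ac_hat, ac) \
               + nn.functional.mse_loss(g(ac_hat), ling)
          loss.backward()
          opt.step()
      # f's weights would then initialize the synthesis DNN before fine-tuning.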

  • Recognition of Anomalously Deformed Kana Sequences in Japanese Historical Documents

    Nam Tuan LY  Kha Cong NGUYEN  Cuong Tuan NGUYEN  Masaki NAKAGAWA  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2019/05/07
      Vol:
    E102-D No:8
      Page(s):
    1554-1564

    This paper presents recognition of anomalously deformed Kana sequences in Japanese historical documents, for which a contest was held by IEICE PRMU in 2017. The contest was divided into three levels according to the number of characters to be recognized: level 1, single characters; level 2, sequences of three vertically written Kana characters; and level 3, unrestricted sets of three or more characters, possibly in multiple lines. This paper focuses on the methods for levels 2 and 3 that won the contest. We basically follow the segmentation-free approach and employ a hierarchy of a Convolutional Neural Network (CNN) for feature extraction, Bidirectional Long Short-Term Memory (BLSTM) for frame prediction, and Connectionist Temporal Classification (CTC) for text recognition, which we name the Deep Convolutional Recurrent Network (DCRN). For level 2, we compare the pretrained CNN approach and the end-to-end approach with more detailed variations. For level 3, we propose a method of vertical text line segmentation and multiple-line concatenation before applying the DCRN; we also examine a two-dimensional BLSTM (2DBLSTM) based method. We present the evaluation of the best methods by cross-validation. We achieved an accuracy of 89.10% for three-Kana-character sequence recognition and 87.70% for unrestricted Kana recognition without employing linguistic context. These results demonstrate the performance of the proposed models on the level 2 and 3 tasks.
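
    A rough sketch of the DCRN hierarchy (CNN for features, BLSTM for frame prediction, CTC for transcription); the layer sizes, image geometry, and vocabulary size are placeholders, not the contest configuration:

      # Sketch of a CNN -> BLSTM -> CTC pipeline in the spirit of the DCRN.
      import torch
      import torch.nn as nn

      NUM_CLASSES = 100                      # Kana vocabulary + CTC blank (id 0)

      class DCRN(nn.Module):
          def __init__(self):
              super().__init__()
              self.cnn = nn.Sequential(      # feature extractor
                  nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                  nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
              )
              self.blstm = nn.LSTM(64 * 16, 128, bidirectional=True, batch_first=True)
              self.fc = nn.Linear(256, NUM_CLASSES)

          def forward(self, x):              # x: (batch, 1, 64, width)
              h = self.cnn(x)                # (batch, 64, 16, width/4)
              h = h.permute(0, 3, 1, 2).flatten(2)  # frames along the text line
              h, _ = self.blstm(h)
              return self.fc(h).log_softmax(-1)     # per-frame class log-probs

      model = DCRN()
      imgs = torch.randn(2, 1, 64, 256)      # toy vertical-line crops
      log_probs = model(imgs).permute(1, 0, 2)      # (frames, batch, classes)
      targets = torch.randint(1, NUM_CLASSES, (2, 3))  # 3 Kana per sample
      ctc = nn.CTCLoss(blank=0)
      loss = ctc(log_probs, targets,
                 torch.full((2,), log_probs.size(0), dtype=torch.long),
                 torch.full((2,), 3, dtype=torch.long))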

  • TDCTFIC: A Novel Recommendation Framework Fusing Temporal Dynamics, CNN-Based Text Features and Item Correlation

    Meng Ting XIONG  Yong FENG  Ting WU  Jia Xing SHANG  Bao Hua QIANG  Ya Nan WANG  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2019/05/14
      Vol:
    E102-D No:8
      Page(s):
    1517-1525

    A traditional recommender system (RS) can learn the potential personal preferences of users and the potential attribute characteristics of items from the rating records between users and items in order to make recommendations. However, for new items with no historical rating records, a traditional RS usually suffers from the typical cold start problem. Additional auxiliary information has often been used to address item cold start; we further bring temporal dynamics, text, and item correlation into our models to relieve it. We propose two new cold start recommendation models, TmTx (Time, Text) and TmTI (Time, Text, Item correlation), for different cold start scenarios. Well-known methods such as TimeSVD++ and CoFactor partially take temporal dynamics, comments, and item correlations into consideration to solve the cold start problem, but none of them combines all of this information. The two models proposed in this paper fuse features such as time, text, and item correlation, and can effectively improve performance under item cold start. We select a convolutional neural network (CNN) to extract features from item description text, which gives the models the ability to deal with cold start items. Experimental results on three real-world datasets show that our proposed models lead to significant improvements over the baseline methods.
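
    A hedged sketch of the CNN text branch: convolutions over the word embeddings of an item description produce an item feature that is usable even for cold-start items with no ratings. Vocabulary, embedding, and filter sizes are assumptions:

      # Sketch of a text-CNN item encoder for cold-start items.
      import torch
      import torch.nn as nn

      VOCAB, EMB, LATENT = 5000, 50, 32      # placeholder sizes

      class TextCNN(nn.Module):
          def __init__(self):
              super().__init__()
              self.emb = nn.Embedding(VOCAB, EMB)
              self.convs = nn.ModuleList(
                  nn.Conv1d(EMB, 64, k) for k in (3, 4, 5))  # n-gram filters
              self.fc = nn.Linear(3 * 64, LATENT)

          def forward(self, tokens):          # tokens: (batch, words)
              e = self.emb(tokens).transpose(1, 2)          # (batch, EMB, words)
              pooled = [c(e).relu().max(dim=2).values for c in self.convs]
              return self.fc(torch.cat(pooled, dim=1))      # item latent vector

      item_vec = TextCNN()(torch.randint(0, VOCAB, (1, 120)))
      # A rating for a cold-start item can then be predicted, e.g., as the dot
      # product of this vector with a user latent vector learned from history.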

  • A ReRAM-Based Row-Column-Oriented Memory Architecture for Convolutional Neural Networks

    Yan CHEN  Jing ZHANG  Yuebing XU  Yingjie ZHANG  Renyuan ZHANG  Yasuhiko NAKASHIMA  

     
    BRIEF PAPER

      Vol:
    E102-C No:7
      Page(s):
    580-584

    An efficient resistive random access memory (ReRAM) structure is developed for accelerating convolutional neural networks (CNNs) through in-memory computation. A novel ReRAM cell circuit is designed with two-directional (2-D) accessibility. The entire memory system is organized as a 2-D array, in which specific memory cells can be identically accessed by both column and row locality. For the in-memory computations of CNNs, only the relevant cells in an identical sub-array are accessed by 2-D read-out operations, which is hard to implement with conventional ReRAM cells. In this manner, the redundant (column or row) accesses of conventional ReRAM structures are prevented, eliminating unnecessary data movement when CNNs are processed in memory. Simulation results show that the energy and bandwidth efficiency of the proposed memory structure are 1.4x and 5x those of a state-of-the-art ReRAM architecture, respectively.

  • Webly-Supervised Food Detection with Foodness Proposal Open Access

    Wataru SHIMODA  Keiji YANAI  

     
    PAPER

      Publicized:
    2019/04/25
      Vol:
    E102-D No:7
      Page(s):
    1230-1239

    To minimize the annotation costs associated with training semantic segmentation models and object detection models, weakly supervised detection and weakly supervised segmentation approaches have been studied extensively. However, most of these approaches assume that the training and testing domains are the same, which at times results in considerable performance drops. For example, if we train an object detection network using only web images showing a large object at the center, it can be difficult for the network to detect multiple small objects. In this paper, we focus on training a CNN with only web images and achieve object detection in the wild. A proposal-based approach can address the domain-difference problem because web images resemble proposal regions: in both domains, the target object is located at the center of the image and the ratio of the target object's size to the image's size is large. Several proposal methods have been proposed to detect regions with high “object-ness.” However, many of them generate a large number of candidates to increase the recall rate. Considering the recent advent of deep CNNs, methods that generate a large number of proposals are problematic in terms of processing time for practical use. Therefore, we propose a CNN-based “food-ness” proposal method that requires neither pixel-wise annotation nor bounding box annotation. Our method generates proposals through backpropagation, and most of these proposals focus only on food objects. In addition, we can easily control the number of proposals. In experiments, we trained a network model using only web images and tested it on the UEC FOOD 100 dataset. We demonstrate that the proposed method achieves high performance compared to traditional proposal methods in terms of the trade-off between accuracy and computational cost. The proposed method is thus an intermediate approach between traditional proposal approaches and fully convolutional approaches: a novel proposal method that generates high “food-ness” regions using fully convolutional networks, based on the backward approach, trained with food images gathered from the web.
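
    A sketch of a backprop-based proposal in the spirit described above; a randomly initialized classifier stands in for the authors' web-trained food network, and the saliency threshold is a made-up value:

      # Sketch: backpropagate a class score to the input, then box the
      # high-gradient region as a proposal.
      import torch
      import torchvision

      # weights=None keeps the sketch offline; real use would load a
      # network trained on food images gathered from the web.
      model = torchvision.models.resnet18(weights=None).eval()
      img = torch.randn(1, 3, 224, 224, requires_grad=True)   # toy input
      score = model(img).max()               # top class score
      score.backward()                       # gradients highlight "object-ness"
      saliency = img.grad.abs().max(dim=1).values[0]          # (224, 224) map
      mask = saliency > saliency.mean() + 2 * saliency.std()  # made-up threshold
      ys, xs = mask.nonzero(as_tuple=True)   # bounding box of salient pixels
      if len(ys) > 0:
          box = (xs.min().item(), ys.min().item(), xs.max().item(), ys.max().item())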

  • Using Temporal Correlation to Optimize Stereo Matching in Video Sequences

    Ming LI  Li SHI  Xudong CHEN  Sidan DU  Yang LI  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2019/03/01
      Vol:
    E102-D No:6
      Page(s):
    1183-1196

    Its large computational complexity makes stereo matching a big challenge in real-time application scenarios. Stereo matching in a video sequence differs slightly from that in a still image because temporal correlation exists among video frames. However, no existing method has exploited this temporal consistency of disparity for acceleration. In this work, we propose a scheme called the dynamic disparity range (DDR) to optimize the matching cost calculation and cost aggregation steps by narrowing the disparity search range, and a scheme called the temporal cost aggregation path to optimize the cost aggregation step. Based on these schemes, we propose the DDR-SGM and DDR-MCCNN algorithms for stereo matching in video sequences. Evaluation results show that the proposed algorithms significantly reduce the computational complexity with only a very slight loss of accuracy. The proposed optimizations for stereo matching are effective, and temporal consistency in stereo video is highly useful for either improving accuracy or reducing computational complexity.
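
    A minimal sketch of the dynamic disparity range idea: instead of scanning the full disparity range, the search is limited to a margin around the previous frame's disparity. The toy per-pixel cost and the margin value are assumptions, not the paper's cost function:

      # Sketch: per-pixel disparity search narrowed by a temporal prior.
      import numpy as np

      D_MAX, MARGIN = 64, 4                  # hypothetical range and margin

      def matching_cost(left, right, y, x, d):
          # toy absolute-difference cost; real systems aggregate over a window
          return abs(int(left[y, x]) - int(right[y, x - d])) if x - d >= 0 else 255

      def match_with_ddr(left, right, prev_disp):
          h, w = left.shape
          disp = np.zeros((h, w), dtype=np.int32)
          for y in range(h):
              for x in range(w):
                  center = prev_disp[y, x]   # temporal prior from last frame
                  lo, hi = max(0, center - MARGIN), min(D_MAX, center + MARGIN + 1)
                  costs = [matching_cost(left, right, y, x, d) for d in range(lo, hi)]
                  disp[y, x] = lo + int(np.argmin(costs))
          return disp

      left = np.random.randint(0, 256, (24, 32), dtype=np.uint8)
      right = np.roll(left, 2, axis=1)       # toy shifted view
      disp = match_with_ddr(left, right, np.full((24, 32), 2, dtype=np.int32))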

  • Combining 3D Convolutional Neural Networks with Transfer Learning by Supervised Pre-Training for Facial Micro-Expression Recognition

    Ruicong ZHI  Hairui XU  Ming WAN  Tingting LI  

     
    PAPER-Pattern Recognition

      Publicized:
    2019/01/29
      Vol:
    E102-D No:5
      Page(s):
    1054-1064

    Facial micro-expressions are momentary and subtle facial reactions, and it is still challenging to automatically recognize them with high accuracy in practical applications. Extracting spatiotemporal features from facial image sequences is essential for facial micro-expression recognition. In this paper, we employ 3D Convolutional Neural Networks (3D-CNNs) for self-learned feature extraction to represent facial micro-expressions effectively, since 3D-CNNs can extract spatiotemporal features from facial image sequences well. Moreover, transfer learning is utilized to deal with the problem of insufficient samples in facial micro-expression databases. We first pre-trained the 3D-CNNs on the normal facial expression database Oulu-CASIA by supervised learning, then transferred the pre-trained model to the target domain, the facial micro-expression recognition task. The proposed method was evaluated on two available facial micro-expression datasets, i.e., CASME II and SMIC-HS. We obtained overall accuracies of 97.6% on CASME II and 97.4% on SMIC, which are 3.4% and 1.6% higher than the 3D-CNNs model without transfer learning, respectively. The experimental results demonstrate that our method achieves superior performance compared to state-of-the-art methods.
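
    A sketch of the transfer step, assuming a small 3D-CNN whose layer sizes and class counts are placeholders: pre-train on the source expression database, copy the feature layers, and fine-tune a new head on the micro-expression data:

      # Sketch: supervised pre-training on a source task, then transferring
      # the 3D convolutional feature layers to the target task.
      import torch
      import torch.nn as nn

      class Small3DCNN(nn.Module):
          def __init__(self, num_classes):
              super().__init__()
              self.features = nn.Sequential(
                  nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
                  nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
                  nn.AdaptiveAvgPool3d(1),
              )
              self.classifier = nn.Linear(32, num_classes)

          def forward(self, clips):           # clips: (batch, 1, frames, H, W)
              return self.classifier(self.features(clips).flatten(1))

      source = Small3DCNN(num_classes=6)      # e.g. normal expression classes
      # ... supervised pre-training on the source database happens here ...
      target = Small3DCNN(num_classes=5)      # micro-expression classes
      target.features.load_state_dict(source.features.state_dict())  # transfer
      # fine-tune `target` (often with a lower learning rate on `features`)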

  • Power Efficient Object Detector with an Event-Driven Camera for Moving Object Surveillance on an FPGA

    Masayuki SHIMODA  Shimpei SATO  Hiroki NAKAHARA  

     
    PAPER-Applications

      Publicized:
    2019/02/27
      Vol:
    E102-D No:5
      Page(s):
    1020-1028

    We propose an object detector using a sliding window method for an event-driven camera, which outputs a subtracted frame (usually with binary values) when changes are detected in the captured images. Since the sliding window skips unchanged portions of the output, the number of candidate target object areas decreases dramatically, which means that our system operates faster and with lower power consumption than a system using a straightforward sliding window approach. Since the event-driven camera outputs binary-precision frames, an all-binarized convolutional neural network (ABCNN) can be used, which allows all convolutional layers to share the same binarized convolutional circuit, thereby reducing the area requirement. We implemented our proposed method on the Xilinx Inc. Zedboard and evaluated it using the PETS 2009 dataset. The results showed that our system outperformed the BCNN system in terms of detection performance, hardware requirement, and computation time. We also showed that an FPGA is a more suitable platform for our system than a mobile GPU. These results indicate that our proposed system is well suited to embedded systems based on stationary cameras (such as security cameras).
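
    A sketch of the event-driven filtering step: windows of the subtracted (binary) frame with too few changed pixels are skipped, so only candidate regions reach the binarized CNN. Window size, stride, and threshold are made-up values:

      # Sketch: keep only sliding windows with enough change events.
      import numpy as np

      WIN, STRIDE, MIN_EVENTS = 32, 8, 50

      def candidate_windows(event_frame):     # event_frame: binary (H, W)
          h, w = event_frame.shape
          for y in range(0, h - WIN + 1, STRIDE):
              for x in range(0, w - WIN + 1, STRIDE):
                  patch = event_frame[y:y + WIN, x:x + WIN]
                  if patch.sum() >= MIN_EVENTS:  # enough change for a CNN run
                      yield y, x, patch          # only these reach the ABCNN

      frame = (np.random.rand(240, 320) > 0.98).astype(np.uint8)
      print(sum(1 for _ in candidate_windows(frame)), "windows kept")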

  • Multi Information Fusion Network for Saliency Quality Assessment

    Kai TAN  Qingbo WU  Fanman MENG  Linfeng XU  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2019/02/26
      Vol:
    E102-D No:5
      Page(s):
    1111-1114

    Saliency quality assessment aims at estimating the objective quality of a saliency map without access to the ground truth. Existing works typically evaluate saliency quality by utilizing information from the saliency map to assess its compactness and closedness, while ignoring information from the image content, which can be used to assess the consistency and completeness of the foreground. In this letter, we propose a novel multi-information fusion network that captures information from both the saliency map and the image content. The key idea is to introduce a siamese module to collect information from the foreground and background, aiming to assess the consistency and completeness of the foreground and the difference between foreground and background. Experiments demonstrate that incorporating image content information significantly boosts the performance of the proposed method. Furthermore, we validate our method on two applications: saliency detection and segmentation. Our method is used to choose the optimal saliency map from a set of candidates, and the selected saliency map is fed into a segmentation algorithm to generate a segmentation map. Experimental results verify the effectiveness of our method.
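
    A hedged sketch of the siamese idea: one shared encoder embeds the foreground and the background selected by the candidate saliency map, and a small head regresses a quality score from both embeddings (architecture and sizes are assumptions):

      # Sketch: shared-weight (siamese) encoding of foreground/background.
      import torch
      import torch.nn as nn

      class Encoder(nn.Module):
          def __init__(self):
              super().__init__()
              self.net = nn.Sequential(
                  nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 32))

          def forward(self, x):
              return self.net(x)

      enc = Encoder()                         # shared weights: siamese branches
      score_head = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

      img = torch.randn(1, 3, 128, 128)
      sal = torch.rand(1, 1, 128, 128)        # candidate saliency map
      fg, bg = enc(img * sal), enc(img * (1 - sal))
      quality = score_head(torch.cat([fg, bg], dim=1))  # predicted quality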

  • Efficient Dynamic Malware Analysis for Collecting HTTP Requests using Deep Learning

    Toshiki SHIBAHARA  Takeshi YAGI  Mitsuaki AKIYAMA  Daiki CHIBA  Kunio HATO  

     
    PAPER

      Publicized:
    2019/02/01
      Vol:
    E102-D No:4
      Page(s):
    725-736

    Malware-infected hosts have typically been detected using network-based Intrusion Detection Systems on the basis of characteristic patterns of HTTP requests collected with dynamic malware analysis. Since attackers continuously modify malicious HTTP requests to evade detection, novel HTTP requests sent from new malware samples need to be exhaustively collected in order to maintain a high detection rate. However, analyzing all new malware samples for a long period is infeasible in a limited amount of time. Therefore, we propose a system for efficiently collecting HTTP requests with dynamic malware analysis. Specifically, our system analyzes a malware sample for a short period and then determines whether the analysis should be continued or suspended. Our system identifies malware samples whose analyses should be continued on the basis of the network behavior in their short-period analyses. To make an accurate determination, we focus on the fact that malware communications resemble natural language from the viewpoint of data structure. We apply the recursive neural network, which has recently exhibited high classification performance in the field of natural language processing, to our proposed system. In the evaluation with 42,856 malware samples, our proposed system collected 94% of novel HTTP requests and reduced analysis time by 82% in comparison with the system that continues all analyses.

  • A Top-N-Balanced Sequential Recommendation Based on Recurrent Network

    Zhenyu ZHAO  Ming ZHU  Yiqiang SHENG  Jinlin WANG  

     
    PAPER

      Publicized:
    2019/01/10
      Vol:
    E102-D No:4
      Page(s):
    737-744

    To solve the low-accuracy problem of recommender systems for long-term users, in this paper we propose a top-N-balanced sequential recommendation based on a recurrent neural network. We postulated and verified that the interactions between users and items are time-dependent in the long term but time-independent in the short term. We balance top-N recommendation and sequential recommendation to generate a better recommendation list by improving the loss function and the generation method. The experimental results demonstrate the effectiveness of our method. Compared with a state-of-the-art recommender algorithm, our method clearly improves the hit rate of the recommendation. Beyond this improvement in basic performance, our method can also handle the cold start problem and supply new users with the same quality of service as old users.
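
    A sketch of one plausible way to balance the two objectives: a cross-entropy sequential term plus a sampled pairwise ranking term in the spirit of top-N recommendation. The weighting and the ranking term are assumptions, not the paper's exact loss:

      # Sketch: GRU next-item loss balanced against a sampled top-N term.
      import torch
      import torch.nn as nn

      N_ITEMS, EMB = 1000, 64
      emb = nn.Embedding(N_ITEMS, EMB)
      gru = nn.GRU(EMB, EMB, batch_first=True)
      alpha = 0.5                             # balance hyperparameter (invented)

      seq = torch.randint(0, N_ITEMS, (8, 20))   # toy user histories
      target = torch.randint(0, N_ITEMS, (8,))   # next item per user
      h, _ = gru(emb(seq))
      logits = h[:, -1] @ emb.weight.t()      # score all items

      seq_loss = nn.functional.cross_entropy(logits, target)  # sequential term
      # top-N-style term: push the target above sampled negatives (BPR-like)
      neg = torch.randint(0, N_ITEMS, (8,))
      topn_loss = -nn.functional.logsigmoid(
          logits.gather(1, target[:, None]) - logits.gather(1, neg[:, None])).mean()
      loss = alpha * seq_loss + (1 - alpha) * topn_loss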

  • VHDL vs. SystemC: Design of Highly Parameterizable Artificial Neural Networks

    David ALEDO  Benjamin CARRION SCHAFER  Félix MORENO  

     
    PAPER-Computer System

      Publicized:
    2018/11/29
      Vol:
    E102-D No:3
      Page(s):
    512-521

    This paper describes the advantages and disadvantages observed when describing complex parameterizable Artificial Neural Networks (ANNs) at the behavioral level using SystemC and at the Register Transfer Level (RTL) using VHDL. ANNs are complex to parameterize because they have a configurable number of layers, each with a unique configuration. This kind of structure makes ANNs, a priori, challenging to parameterize using Hardware Description Languages (HDLs). Intuitively, it therefore seems that ANNs would benefit from the rise in abstraction level from RTL to the behavioral level. This paper presents the results of implementing an ANN at both levels of abstraction. Surprisingly, the results show that VHDL leads to better results and allows a much higher degree of parameterization than SystemC. The implementations of these parameterizable ANNs are open source and freely available online. Finally, we make some recommendations for future HLS tools to improve their parameterization capabilities.

  • Fast Lane Detection Based on Deep Convolutional Neural Network and Automatic Training Data Labeling

    Xun PAN  Harutoshi OGAI  

     
    PAPER-Image

      Vol:
    E102-A No:3
      Page(s):
    566-575

    Lane detection, or road detection, is one of the key features of autonomous driving. In the computer vision area, it is still a very challenging problem, since there are various types of road scenarios that demand very high algorithmic robustness. Considering the rather high speed of vehicles, high efficiency is also a very important requirement for practical autonomous driving applications. In this paper, we propose a deep convolutional neural network based lane detection method that treats lane detection as pixel-level segmentation of the lane markings. We also propose an automatic training data generation method, which can significantly reduce the effort of the training phase. Experiments prove that our method achieves high accuracy for various road scenes in real time.

  • Object Tracking by Unified Semantic Knowledge and Instance Features

    Suofei ZHANG  Bin KANG  Lin ZHOU  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2018/11/30
      Vol:
    E102-D No:3
      Page(s):
    680-683

    Instance-feature-based deep learning methods boost the performance of high-speed object tracking systems by directly comparing the target with its template during training and tracking. However, from the perspective of the human vision system, prior knowledge of the target also plays a key role during tracking. To integrate both semantic knowledge and instance features, we propose a convolutional network based object tracking framework that simultaneously outputs bounding boxes based on different types of prior knowledge as well as the confidences of the corresponding assumptions. Experimental results show that our proposed approach achieves both higher accuracy and higher efficiency than other leading methods on tracking tasks covering most daily objects.

  • Unsupervised Deep Domain Adaptation for Heterogeneous Defect Prediction

    Lina GONG  Shujuan JIANG  Qiao YU  Li JIANG  

     
    PAPER-Software Engineering

      Publicized:
    2018/12/05
      Vol:
    E102-D No:3
      Page(s):
    537-549

    Heterogeneous defect prediction (HDP) aims to detect the largest number of defective software modules in one project by using historical data collected from other projects with different metrics. However, these data cannot be used directly because the metric sets differ among projects. Meanwhile, software data have more non-defective instances than defective instances, which may cause a significant bias towards the non-defective majority. To address these two restrictions, we propose an unsupervised deep domain adaptation approach to build an HDP model. Specifically, we first map the data of the source and target projects into a unified metric representation (UMR). Then, we design a simple neural network (SNN) model to deal with the heterogeneous and class-imbalanced problems in software defect prediction (SDP). In particular, our model introduces the Maximum Mean Discrepancy (MMD) as the distance between the source and target data to reduce the distribution mismatch, and uses the cross-entropy loss as the classification loss. Extensive experiments on 18 public projects from four datasets indicate that the proposed approach can build an effective prediction model for heterogeneous defect prediction (HDP) and outperforms the related competing approaches.
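
    A sketch of the training objective described above: cross-entropy on labeled source data plus an RBF-kernel MMD term pulling source and target feature distributions together. The feature sizes and kernel bandwidth are placeholders, not the paper's settings:

      # Sketch: classification loss + MMD domain-adaptation penalty.
      import torch
      import torch.nn as nn

      def rbf_mmd(x, y, sigma=1.0):
          def k(a, b):
              return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
          return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

      snn = nn.Sequential(nn.Linear(20, 16), nn.ReLU())  # shared feature layers
      clf = nn.Linear(16, 2)                              # defective / clean

      src = torch.randn(64, 20)              # source project (UMR features)
      src_y = torch.randint(0, 2, (64,))
      tgt = torch.randn(64, 20)              # unlabeled target project

      fs, ft = snn(src), snn(tgt)
      loss = nn.functional.cross_entropy(clf(fs), src_y) + rbf_mmd(fs, ft)
      loss.backward()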

  • Millimeter-Wave InSAR Target Recognition with Deep Convolutional Neural Network

    Yilu MA  Yuehua LI  

     
    LETTER-Pattern Recognition

      Publicized:
    2018/11/26
      Vol:
    E102-D No:3
      Page(s):
    655-658

    Target recognition in Millimeter-wave Interferometric Synthetic Aperture Radiometer (MMW InSAR) imaging is always a crucial task. However, the recognition performance of conventional algorithms degrades when facing unpredictable noise interference in practical scenarios and the information loss caused by the inverse imaging processing of InSAR. These difficulties make it necessary to develop general-purpose denoising techniques and robust feature extractors for InSAR target recognition. In this paper, we propose a denoising convolutional neural network (D-CNN) and demonstrate its advantage on the MMW InSAR automatic target recognition problem. Instead of directly feeding the MMW InSAR image to the CNN, the proposed algorithm uses the visibility function samples as the input of a fully connected denoising layer and recasts target recognition as a data-driven supervised learning task that learns robust feature representations from the space-frequency domain. Compared with traditional methods that act on the MMW InSAR output images, the D-CNN is not affected by the information loss caused by the inverse imaging process. Furthermore, experimental results on a simulated MMW InSAR image dataset illustrate that the D-CNN has superior immunity to noise and achieves outstanding performance on the recognition task.

  • Rectifying Transformation Networks for Transformation-Invariant Representations with Power Law

    Chunxiao FAN  Yang LI  Lei TIAN  Yong LI  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2018/12/04
      Vol:
    E102-D No:3
      Page(s):
    675-679

    This letter proposes a representation learning framework for convolutional neural networks (Convnets) that aims to rectify and improve the feature representations learned by existing transformation-invariant methods. The existing methods usually encode feature representations invariant to a wide range of spatial transformations by augmenting input images or transforming intermediate layers. Unfortunately, simply transforming the intermediate feature maps may lead to unpredictable representations that are ineffective in describing the transformed features of the inputs. The reason is that the operations of convolution and geometric transformation do not commute in most cases, so exchanging the two operations yields a transformation error. This error may potentially harm the performance of the classification networks. Motivated by the fractal statistics of natural images, this letter proposes a rectifying transformation operator to minimize the error. The proposed method is differentiable and can be inserted into the convolutional architecture without any modification to the optimization algorithm. We show that the rectified feature representations result in better classification performance on two benchmarks.

  • Automatic Speech Recognition System with Output-Gate Projected Gated Recurrent Unit

    Gaofeng CHENG  Pengyuan ZHANG  Ji XU  

     
    PAPER-Speech and Hearing

      Publicized:
    2018/11/19
      Vol:
    E102-D No:2
      Page(s):
    355-363

    The long short-term memory recurrent neural network (LSTM) has achieved tremendous success for automatic speech recognition (ASR). However, the complicated gating mechanism of LSTM introduces a massive computational cost and limits the application of LSTM in some scenarios. In this paper, we describe our work on accelerating the decoding speed and improving the decoding accuracy. First, we propose an architecture, which is called Projected Gated Recurrent Unit (PGRU), for ASR tasks, and show that the PGRU can consistently outperform the standard GRU. Second, to improve the PGRU generalization, particularly on large-scale ASR tasks, we propose the Output-gate PGRU (OPGRU). In addition, the time delay neural network (TDNN) and normalization methods are found beneficial for OPGRU. In this paper, we apply the OPGRU for both the acoustic model and recurrent neural network language model (RNN-LM). Finally, we evaluate the PGRU on the total Eval2000 / RT03 test sets, and the proposed OPGRU single ASR system achieves 0.9% / 0.9% absolute (8.2% / 8.6% relative) reduction in word error rate (WER) compared to our previous best LSTM single ASR system. Furthermore, the OPGRU ASR system achieves significant speed-up on both acoustic model and language model rescoring.
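
    A rough sketch, not the paper's exact equations: a GRU cell whose hidden state passes through a low-rank projection before the next recurrence, in the spirit of projected recurrent units; all sizes are placeholders:

      # Sketch: GRU recurrence with a low-rank projection of the state.
      import torch
      import torch.nn as nn

      class ProjectedGRUCell(nn.Module):
          def __init__(self, input_size, hidden_size, proj_size):
              super().__init__()
              self.cell = nn.GRUCell(input_size, hidden_size)
              self.proj = nn.Linear(hidden_size, proj_size, bias=False)
              self.up = nn.Linear(proj_size, hidden_size, bias=False)

          def forward(self, x_seq):           # x_seq: (time, batch, input_size)
              h = x_seq.new_zeros(x_seq.size(1), self.cell.hidden_size)
              outputs = []
              for x in x_seq:
                  h = self.cell(x, h)
                  p = self.proj(h)            # projected state cuts parameters
                  h = self.up(p)              # map back for the next recurrence
                  outputs.append(p)
              return torch.stack(outputs)

      out = ProjectedGRUCell(40, 512, 128)(torch.randn(50, 4, 40))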

  • Parallel Feature Network For Saliency Detection

    Zheng FANG  Tieyong CAO  Jibin YANG  Meng SUN  

     
    LETTER-Image

      Vol:
    E102-A No:2
      Page(s):
    480-485

    Saliency detection is widely used in many vision tasks such as image retrieval, compression, and person re-identification. Deep-learning methods have achieved great results, but most of them focus on performance while ignoring model efficiency, which makes them hard to transplant into other applications. How to design an efficient model has therefore become the main problem. In this letter, we propose the parallel feature network, a saliency model built on a convolutional neural network (CNN) in a parallel manner. Parallel dilation blocks are first used to extract features from different layers of the CNN, then a parallel upsampling structure is adopted to upsample the feature maps. Finally, saliency maps are obtained by fusing summations and concatenations of the feature maps. Our final model, built on VGG-16, is much smaller and faster than existing saliency models while achieving state-of-the-art performance.
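
    A sketch of the parallel structure under stated assumptions (branch widths, dilation rate, and fusion details are invented): dilated blocks read different backbone stages in parallel, are upsampled in parallel, and are fused into a saliency map:

      # Sketch: parallel dilated branches + parallel upsampling + fusion.
      import torch
      import torch.nn as nn

      class ParallelFeatureNet(nn.Module):
          def __init__(self, chans=(128, 256, 512)):  # e.g. VGG-16 stage widths
              super().__init__()
              self.dilated = nn.ModuleList(
                  nn.Conv2d(c, 32, 3, padding=2, dilation=2) for c in chans)
              self.fuse = nn.Conv2d(32 * len(chans), 1, 1)

          def forward(self, feats, out_size):  # feats: list of stage outputs
              ups = [nn.functional.interpolate(d(f).relu(), size=out_size,
                                               mode="bilinear", align_corners=False)
                     for d, f in zip(self.dilated, feats)]
              return torch.sigmoid(self.fuse(torch.cat(ups, dim=1)))

      net = ParallelFeatureNet()
      feats = [torch.randn(1, c, s, s) for c, s in ((128, 56), (256, 28), (512, 14))]
      saliency = net(feats, out_size=(224, 224))       # (1, 1, 224, 224) map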
